Using Options for Long-Horizon Off-Policy Evaluation

Authors

  • Zhaohan Daniel Guo
  • Philip S. Thomas
  • Emma Brunskill
Abstract

Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy evaluation (OPE) algorithms use historical data collected from running a previous policy to evaluate a new policy, which provides a means for evaluating a policy without requiring it to ever be deployed. Importance sampling is a popular OPE method because it is robust to partial observability and works with continuous states and actions. However, we show that the amount of historical data required by importance sampling can scale exponentially with the horizon of the problem: the number of sequential decisions that are made. We propose using policies over temporally extended actions, called options, to address this long-horizon problem. We show theoretically and experimentally that combining importance sampling with options-based policies can significantly improve performance for long-horizon problems.
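
As a rough illustration of the importance sampling estimator the abstract refers to, the following is a minimal sketch of ordinary per-trajectory importance sampling, not the options-based method proposed in the paper. The function name `importance_sampling_estimate`, the `(state, action, reward)` trajectory layout, and the callables `pi_e` (evaluation policy) and `pi_b` (behavior policy) are assumptions made for this example and do not come from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical data layout: a trajectory is a list of (state, action, reward) steps.
Step = Tuple[int, int, float]
Policy = Callable[[int, int], float]  # returns the probability of `action` in `state`


def importance_sampling_estimate(
    trajectories: List[List[Step]],
    pi_e: Policy,
    pi_b: Policy,
    gamma: float = 1.0,
) -> float:
    """Ordinary importance sampling estimate of the evaluation policy's return.

    Each trajectory was generated by the behavior policy pi_b. Its discounted
    return is reweighted by the product of per-step probability ratios
    pi_e(a | s) / pi_b(a | s), and the reweighted returns are averaged.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0    # product of per-step likelihood ratios
        ret = 0.0       # discounted return of this trajectory
        discount = 1.0
        for state, action, reward in trajectory:
            weight *= pi_e(state, action) / pi_b(state, action)
            ret += discount * reward
            discount *= gamma
        total += weight * ret
    return total / len(trajectories)


if __name__ == "__main__":
    # Toy data (made up for illustration): one state, two actions.
    behavior = lambda s, a: 0.5                      # uniform behavior policy
    evaluation = lambda s, a: 0.8 if a == 1 else 0.2
    data = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
    print(importance_sampling_estimate(data, evaluation, behavior))
```

Because the weight is a product of one ratio per decision, its range can grow geometrically with the number of decisions in a trajectory, which is consistent with the long-horizon difficulty the abstract describes.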


Journal:
  • CoRR

Volume abs/1703.03453

Publication date 2017